This study aims to discover metabolic alterations in hyperinsulinaemic androgen excess (HIAE), which is the phenotype that results from the pathology known as polycystic ovary syndrome (PCOS), in order to understand health risks as cardiovascular disease and Diabetes Type 2 (T2D) in adulthood. For that purpose, serum samples from young thin (non-obese) girls with a similar body mass index and diagnosed with HIAE and from a control group of 14 healthy girls with similar weight, age and same ethnicity were compared prior and after a polytherapy (low-dose combination of pioglitazone, flutamide and metformin (PioFluMet). These serum samples of HIAE and control girls were profiled using H-NMR and LC-ESI-QTOF, acquiring all the time MS1 mass spectra. It was used an untargeted metabolomic approach because it wasn’t known a priori which metabolic events would occur due to the polytherapy. This untargeted approach was useful to know which metabolic pathways were involved in the polytherapy. Afterward, it was done a triple-quadrupole (QqQ) targeted analysis in MRM mode (targeted MS/MS experiments) in order to measure all the metabolites that were studied in the untargeted approach and be able to identify which metabolites were dysregulated. The results showed that girls with HIAE presented metabolic and endocrine alterations (such as increased levels of insulin and testosterone), increased levels of of VLDL, small LDL and large LDL, decreased levels of HDL and lower levels of methionine associated with greater levels of methionine sulfoxide. Finally, after 18 months of the early treatment with a low-dose combination of PioFluMet, HIAE girls presented metabolic changes, such as a reduction in insulin concentrations. Therefore, they improved the levels of oxidative stress markers and the lipoprotein profile and experienced a revert in hyperandrogenism and hyperinsulinemia restoring them to similar levels found in healthy girls.
#Loading libraries. To include it in the Packages.
#Loading xcms
library("xcms")
#Loading RColorBrewer
library("RColorBrewer")
#Loading ggplot2
library("ggplot2")
#Loading magrittr
library("magrittr")
#Loading plotly
library("plotly")
#Loading DT
library("DT")
load("TaskSession8.Rdata")
path.mzML <- "C:/Users/Bernat/Desktop/UNI/3r 2010-2021/Omicas/Practicas/S8-20201119-practicum-20201119/Task"
mzML.files <- list.files(path.mzML, pattern= ".mzML",
recursive=T, full.names = T)
pheno <- phenoDataFromPaths(mzML.files)
pheno$sample_name <- rownames(pheno)
pheno$sample_group <- pheno$class
The phenoDataFromPaths function builds a data.frame representing the experimental design from the folder structure in which the files of the experiment are located. Each row is a serum sample.
raw_data <- readMSData(files = mzML.files,
pdata = new("NAnnotatedDataFrame", pheno),
mode = "onDisk")
mode = “onDisk”: reads only spectrum header from files, but no data which is retrieved on demand allowing to handle very large experiments with high memory demand. It does not upload it to RAM, only when I tell it to (on demand). It reads only the metadata, not the mass specs, which is what it weighs.
In order to see the entire chromatographic run time we plot TIC:
## We define colors according to experimental groups
group_colors <- paste0(brewer.pal(length(levels(pData(raw_data)$sample_group)),
"Dark2"), "60")
names(group_colors) <- levels(pData(raw_data)$sample_group)
## We plot TIC using chromatogram() function
## raw_data is the s4 object
TICs <- chromatogram(raw_data, aggregationFun = "sum")
# Plot of TIC according to experimental groups
plot(TICs, col = group_colors[raw_data$sample_group], main = "TIC")
We note the entire chromatographic run takes 600 seconds, that is to say, 10 minutes.
table(pheno$sample_group)
##
## CTR DIA_0 DIA_18 PIO_0 PIO_18 QC
## 14 5 6 6 6 8
As it is shown, there are 6 experimental groups: CTR, DIA_0, DIA_18, PIO_0, PIO_18 and QC, with 14, 5, 6, 6, 6, and 8 samples respectively.
CentWaveParam()
#it tell us the default parameters (already defined)
We must modify these parameters according what it is described in the on-line xcms parameters We define xcms online parameters
cw_onlinexcms_default <- CentWaveParam(ppm = 15, peakwidth = c(5, 20),
snthresh = 6, prefilter = c(3, 100),
mzCenterFun = "wMean", integrate = 1L,
mzdiff = 0.01, fitgauss = FALSE,
noise = 0, verboseColumns = FALSE,
roiList = list(), firstBaselineCheck = TRUE,
roiScales = numeric())
xdata <- findChromPeaks(raw_data, param = cw_onlinexcms_default)
df2DT <- chromPeaks(xdata)
dim (df2DT)
## [1] 819946 11
We get the peaks it has found in the s4 object. We have detected 819946 peaks.
xdata
## MSn experiment data ("XCMSnExp")
## Object size in memory: 11.57 Mb
## - - - Spectra data - - -
## MS level(s): 1
## Number of spectra: 34582
## MSn retention times: 0:2 - 10:0 minutes
## - - - Processing information - - -
## Data loaded [Thu Dec 03 15:37:44 2020]
## MSnbase version: 2.14.2
## - - - Meta data - - -
## phenoData
## rowNames: CON_BASA_567795 CON_BASA_574488 ... QC8 (45 total)
## varLabels: class sample_name sample_group
## varMetadata: labelDescription
## Loaded from:
## [1] CON_BASA_567795.mzML... [45] QC8.mzML
## Use 'fileNames(.)' to see all files.
## protocolData: none
## featureData
## featureNames: F01.S001 F01.S002 ... F45.S768 (34582 total)
## fvarLabels: fileIdx spIdx ... spectrum (35 total)
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## - - - xcms preprocessing - - -
## Chromatographic peak detection:
## method: centWave
## 1200481 peaks identified in 45 samples.
## On average 26677 chromatographic peaks per sample.
## Alignment/retention time adjustment:
## method: obiwarp
## Correspondence:
## method: chromatographic peak density
## 25417 features identified.
## Median mz range of features: 0.0059752
## Median rt range of features: 6.5615
## 380535 filled peaks (on average 8456.333 per sample).
## rt and m/z range of the L-methionine peak area
M <- 149.051049291 #monoisotopic weight of L-methionine
H <- 1.007276 #proton weight
MH <- M + H #theoretical weight = 150.0583
## L-methionine peak mz range width according to XCMS parameters
error <- cw_onlinexcms_default@ppm #errors in ppm
mmu_min <- MH-(error*MH/1e6) #proton weight of L-methionine +- a ppm error: 15 ppm
mmu_max <- MH+(error*MH/1e6)
mzr <- c(mmu_min, mmu_max) #m/z range
## L-methionine peak rt range width according to XCMS parameters
apex.Lmethionine <- 293 #retention time ~ 293 s
rt.window <- cw_onlinexcms_default@peakwidth[2] #max rt width in seconds
rtr <- c(apex.Lmethionine-rt.window, apex.Lmethionine+rt.window)
chr_Lmethionine <- chromatogram(raw_data, mz = mzr, rt = rtr)
plot(chr_Lmethionine, col = group_colors[chr_Lmethionine$sample_group], lwd = 2)
This is the Extracted Ion Chromatogram of L-methionine. It plots the chromatogram in a mass range and in a retention time range. We note that we have 45 peaks, one for each sample, and colored according to their experimental group. We are in a mass range that goes from a minimum mass range (mmu_min) of 150,0561 to a maximum mass range (mmu_max) of 150,0606. The retention time is 293 s, because when we know that a metabolite is present in the samples, we know which is its the retention time. We observe that the maximum intensity at the apex (maxo) is 80000 approximately.
spMS1.Lmethionine<- raw_data %>%
filterRt(rt = c(apex.Lmethionine, apex.Lmethionine+1)) %>%
filterFile(1) %>%
spectra
ggplotly(plot(spMS1.Lmethionine[[1]]))
There can be a little bit of movement of the RT from sample to sample. These minimums alignments can be corrected and we do use adjustRtime() wrapper function. We have different algorithms to adjust RT. We are using obiwarp. ObiwarpParam() warps the (full) data to a reference sample.
Settings for the alignment: We are going to use the online xcms parameters: profStep= 1 –> binSize = 1
xdata <- adjustRtime(xdata, param = ObiwarpParam(binSize = 1))
It uses sample number 23 as a sample against which you align the remaining ones. It is done for all the samples.
Correspondence group peaks or signals from the same ion across samples.
Setting on-line xcms default parameters for peak density using PeakDensityParam().
We define the online xcms peak density parameters: onlinexcms_pdp (go to online xcms and we use the parameters we see in the alignment section).
onlinexcms_pdp <- PeakDensityParam(sampleGroups = xdata$sample_group,
bw = 5,
minFraction = 0.5,
binSize = 0.015,
minSamples = 1,
maxFeatures = 100)
Performance of the correspondence analysis using optimized peak density settings on on-line xcms. In order to see how correspondence works we use groupChromPeaks().
xdata <- groupChromPeaks(xdata, param = onlinexcms_pdp)
## xdata contains my features.
Extracting the feature definitions as data frame. We can see the correspondence results with featureDefinitions().
feature.info <- as.data.frame(featureDefinitions(xdata))
dim(feature.info)
## [1] 25417 15
We have found 25417 features.
xdata
## MSn experiment data ("XCMSnExp")
## Object size in memory: 11.57 Mb
## - - - Spectra data - - -
## MS level(s): 1
## Number of spectra: 34582
## MSn retention times: 0:2 - 10:0 minutes
## - - - Processing information - - -
## Data loaded [Thu Dec 03 15:37:44 2020]
## MSnbase version: 2.14.2
## - - - Meta data - - -
## phenoData
## rowNames: CON_BASA_567795 CON_BASA_574488 ... QC8 (45 total)
## varLabels: class sample_name sample_group
## varMetadata: labelDescription
## Loaded from:
## [1] CON_BASA_567795.mzML... [45] QC8.mzML
## Use 'fileNames(.)' to see all files.
## protocolData: none
## featureData
## featureNames: F01.S001 F01.S002 ... F45.S768 (34582 total)
## fvarLabels: fileIdx spIdx ... spectrum (35 total)
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## - - - xcms preprocessing - - -
## Chromatographic peak detection:
## method: centWave
## 1200481 peaks identified in 45 samples.
## On average 26677 chromatographic peaks per sample.
## Alignment/retention time adjustment:
## method: obiwarp
## Correspondence:
## method: chromatographic peak density
## 25417 features identified.
## Median mz range of features: 0.0059752
## Median rt range of features: 6.5615
## 380535 filled peaks (on average 8456.333 per sample).
I look for what is the index in this feature matrix, where is the feature that represents the L-methionine:
ix_Lmethionine_features <- which(feature.info$mzmed >= mzr[1] &
feature.info$mzmed <= mzr[2] &
feature.info$rtmed >= rtr[1] &
feature.info$rtmed <= rtr[2])
ix_Lmethionine_features
## [1] 803
It returns 803, so we can say that the feature that represents the L-methionine is the feature 803.
Lmethionine.features <- feature.info[ix_Lmethionine_features,]
datatable(Lmethionine.features[,-ncol(Lmethionine.features)]) %>%
formatRound(c("mzmed", "mzmin", "mzmax"), 4) %>%
formatRound(c("rtmed", "rtmax", "rtmin"),0)
Lmethionine.features finds the feature with the index 803: the feature FT00803. This feature has 45 peaks (npeaks): 14 peaks in the CTR group, 5 peaks in the DIA_0 group, 6 peaks in the DIA_18 group, 6 peaks in the PIO_0 group, 6 peaks in the PIO_18 group, and 8 peaks in the QC group. So, xcms has found a feature that is L-methionine and is well aligned.
Is there more than one peak in features associated with methionine? We can see there are 45 peaks associated with L-methionine.
Was methionine detected in all samples? We can see that L-methionine was detected in all samples because this feature has 45 peaks (npeaks): 14 peaks in the CTR group, 5 peaks in the DIA_0 group, 6 peaks in the DIA_18 group, 6 peaks in the PIO_0 group, 6 peaks in the PIO_18 group, and 8 peaks in the QC group. This means this feature is in all the samples and therefore L-methionine.
The feature matrix we will create later (fmat) contains NA values for samples in which no chromatographic peak was detected in the feature’s m/z-rt region. We must fill in missing peaks in order to fill in the empty areas, and reduce the number of NA values in our matrix.
## We get missing values before filling in peaks
apply(featureValues(xdata, filled = FALSE), MARGIN = 2,
FUN = function(z) sum(is.na(z)))
We use the fillChromPeaks method to fill in intensity data for such missing values from the original files.
xdata <- fillChromPeaks(xdata)
## We use featureValues to extract the integrated peak intensity per feature/sample
## We get feature intensity matrix and its dimension
fmat <- featureValues(xdata, value = "maxo", method = "maxint")
dim(fmat)
## [1] 25417 45
The featureValues method returns a matrix with rows being features and columns samples. We have 25417 features and 45 samples
df.fmat <- data.frame(fmat)
myfeature.fmat <- df.fmat[803,]
df.myInfo <- data.frame(FT00803 = t(myfeature.fmat), GROUP = pheno$sample_group)
ggplot(data = df.myInfo, mapping = aes(x=df.myInfo$GROUP, y= df.myInfo$FT00803, fill = group_colors[raw_data$sample_group] )) +
geom_boxplot(alpha = 0.5) +
theme(legend.position="none")+
scale_y_continuous(name = "Intensity") +
scale_x_discrete(name = "Experimental group") +
ggtitle("L-methionine differences (in terms of feature intensities) according to experimental groups")
We can observe that the interquartile range is quite similar in all boxplots except in DIA_18 one. The IQR in experimental group DIA_18 is a lot larger than the rest of experimental groups. We note that experimental group DIA_18 boxplot varies quite a bit (much larger height of the boxplot; goes from 40000 to 70000). We have a bigger spread in DIA_18 boxplot than in the other experimental groups. The other experimental group boxplots are pretty condensed. That means they varies less; they are more consistent and easier to predict. CTR, DIA_18 and QC experimental groups boxplots are rather symmetric. The Q2 are in the center of the boxplot, while the skewness is particularly large in DIA_0, PIO_0 and PIO_18 boxplots. We can see two outliers beyond the whiskers of the CTR boxplot and one outlier beyond the whiskers of the DIA_0 boxplot and PIO_0 boxplot. These are some bigger values than the most. PIO_18 boxplot has one outlier below the whiskers, meaning it is a smaller value than the most. We may notice that PIO_18 boxplot is more or less at the same level of CTR boxplot (around 40000). In conclusion, the fact of having hyperinsulinaemic androgen excess (HIAE) leads to lower L-methionine intensity, while being treated with the polytherapy (PioFluMet) for 18 months restore L-methionine to similar levels found in healthy girls (CTR experimental group).